Even with the introduction of the ggplot2 package, many R users still rely on base R (i.e. R without any user-installed packages) to create their plots. While the charts produced tend to be less fancy than their ggplot2 counterparts, the syntax for base R can be very succinct. This is helpful when you are just exploring the data and are not too fussed about presentation.
For this document, we’ll use the built-in mtcars dataset as a running example.
data(mtcars)
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
Let’s say we are interested in the relationship between mpg and wt. We can make a scatterplot using the plot command, defining the x and y arguments of the function. (Recall that data frames are really just lists, so mtcars$wt refers to the wt element of mtcars, i.e. the values in the wt column.)
plot(x = mtcars$wt, y = mtcars$mpg)
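As a small aside, the remark that a data frame is a list of columns can be checked directly in the console. The sketch below also shows the formula interface to plot, an alternative spelling of the same scatterplot that the rest of this document does not rely on:

```r
data(mtcars)

# A data frame is a list whose elements are its columns
is.list(mtcars)                        # TRUE
identical(mtcars$wt, mtcars[["wt"]])   # TRUE

# Same scatterplot as above, written as a formula (y ~ x) with a data argument
plot(mpg ~ wt, data = mtcars)
```

The formula version saves typing mtcars$ twice, and it picks up the axis labels wt and mpg automatically.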
To have the points be represented by other shapes (instead of the default open circles), add the pch argument to plot (see ?points for the full list of shapes):
plot(x = mtcars$wt, y = mtcars$mpg, pch = 5)
To change the size of the points, add the cex option to plot (1 is the default value):
plot(x = mtcars$wt, y = mtcars$mpg, pch = 7, cex = 2)
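To see what each pch value looks like, one option is to draw all of them in a single plot. The following is a quick sketch; the variable name pch_values and the layout choices are only illustrative:

```r
# Draw plotting symbols 0 to 25, each labelled with its pch number
pch_values <- 0:25
plot(x = pch_values, y = rep(1, length(pch_values)), pch = pch_values,
     cex = 2, ylim = c(0.5, 1.5), yaxt = "n", xlab = "pch", ylab = "")
text(x = pch_values, y = 0.8, labels = pch_values, cex = 0.7)
```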
In some cases (e.g. time series data), we may want lines joining the data points instead of showing just the points themselves. To have lines instead of points, add type = "l" to the plot command. (Other options for type include "o", "p" and "b". Try them!)
plot(x = mtcars$wt, y = mtcars$mpg, type = "l")
(Drawing lines doesn’t really make sense in this context, so the above is simply for illustration.) We can have different types of lines by adding an lty option to plot (see the lty entry in ?par for more line options):
plot(x = mtcars$wt, y = mtcars$mpg, type = "l", lty = "dashed")
For different line widths, use lwd:
plot(x = mtcars$wt, y = mtcars$mpg, type = "l", lwd = 2)
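To compare the type values mentioned earlier, they can be drawn side by side. This is a sketch under one assumption that is not part of the original examples: the rows are sorted by wt first, so that the lines run left to right instead of zigzagging between points in row order.

```r
data(mtcars)

# Sort by weight (an addition for this sketch, not done in the plots above)
ord <- order(mtcars$wt)
wt_sorted <- mtcars$wt[ord]
mpg_sorted <- mtcars$mpg[ord]

# Show four values of type in a 2 x 2 grid of plots
old_par <- par(mfrow = c(2, 2))
for (tp in c("p", "l", "b", "o")) {
  plot(x = wt_sorted, y = mpg_sorted, type = tp,
       main = paste0('type = "', tp, '"'), xlab = "wt", ylab = "mpg")
}
par(old_par)  # restore the previous plot layout
```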
To change the color of the points, use the col option:
plot(x = mtcars$wt, y = mtcars$mpg, pch = 16, col = "blue")
As in ggplot2, we can make the color of each point depend on which category it is in. Let’s say we want to color the points depending on the value of cyl. We first convert cyl to a factor, then pass it as the value of col in the plot call:
mtcars$cyl <- factor(mtcars$cyl)
plot(x = mtcars$wt, y = mtcars$mpg, pch = 16, col = mtcars$cyl)
To add a legend, follow the plot call with a legend call. The x and y options determine the top-left corner of the legend box. (Use the console to figure out what levels(mtcars$cyl) returns. Notice how you have to specify col and pch in the legend call as well. What happens if you don’t include them?)
plot(x = mtcars$wt, y = mtcars$mpg, pch = 16, col = mtcars$cyl)
legend(x = 5, y = 32, legend = levels(mtcars$cyl), col = c(1:3), pch = 16)
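If guessing x and y coordinates for the legend feels fiddly, legend also accepts a position keyword such as "topright". The sketch below combines that with an explicit color per level, so the plot and the legend cannot drift out of sync. The names cyl_f and cyl_cols and the particular colors are arbitrary choices for illustration.

```r
data(mtcars)
cyl_f <- factor(mtcars$cyl)   # levels "4", "6", "8"

# One named color per level (the color choices are arbitrary)
cyl_cols <- c("4" = "steelblue", "6" = "orange", "8" = "firebrick")

# Look up the color of each car by its level label
point_cols <- unname(cyl_cols[as.character(cyl_f)])
plot(x = mtcars$wt, y = mtcars$mpg, pch = 16, col = point_cols)

# A keyword position replaces the x and y coordinates
legend("topright", legend = levels(cyl_f),
       col = unname(cyl_cols[levels(cyl_f)]), pch = 16, title = "cyl")
```

Because both the points and the legend read their colors from cyl_cols, changing a color in one place updates both.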
The code below shows how you can add titles and change the axis labels:
plot(x = mtcars$wt, y = mtcars$mpg,
     main = "Miles per gallon vs. Weight", xlab = "Weight", ylab = "mpg")
To change the size of the title and the axis labels, use the cex.main and cex.lab options respectively. (cex.axis controls the size of the tick-mark numbers along the axes.)
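A minimal sketch of the size options follows. Note that in base R, cex.lab scales the axis labels, while cex.axis scales the tick-mark numbers; the specific values used here are arbitrary.

```r
data(mtcars)
plot(x = mtcars$wt, y = mtcars$mpg,
     main = "Miles per gallon vs. Weight", xlab = "Weight", ylab = "mpg",
     cex.main = 2,    # title at twice the default size
     cex.lab = 1.5,   # axis labels ("Weight", "mpg") at 1.5 times the default
     cex.axis = 0.8)  # tick-mark numbers slightly smaller than the default
```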
A histogram shows the distribution of a single variable as frequency counts per bin. To plot a histogram, use the hist command:
hist(mtcars$mpg)
By default, R chooses the number of bins for you (using Sturges’ formula). If you want to specify the number of bins, you can use the breaks option and give it a number:
hist(mtcars$mpg, breaks = 10)
R treats a single number given to breaks as a suggestion only and rounds the breakpoints to “pretty” values, so the number of bins you get sometimes doesn’t correspond exactly to the number you asked for. To have exact control over this, give breaks a vector of breakpoints instead of a single number. For example, the code below bins the values into (10, 12], (12, 14], …, (32, 34]. (Type ?seq to read the documentation for the seq function and figure out what it returns.)
hist(mtcars$mpg, breaks = seq(10, 34, by = 2))
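To check which breakpoints R actually used, hist can be called with plot = FALSE, which returns the binning without drawing anything. The sketch below compares a suggested bin count with an explicit vector of breakpoints; the names h_suggested and h_exact are only illustrative.

```r
data(mtcars)

# A single number is only a suggestion for the number of bins
h_suggested <- hist(mtcars$mpg, breaks = 10, plot = FALSE)
h_suggested$breaks           # breakpoints R actually chose
length(h_suggested$counts)   # number of bins R actually used

# A vector of breakpoints is used exactly as given
h_exact <- hist(mtcars$mpg, breaks = seq(10, 34, by = 2), plot = FALSE)
h_exact$breaks               # the breakpoints we supplied
h_exact$counts               # number of cars falling in each bin
```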
To make a boxplot, use boxplot:
boxplot(mtcars$mpg)
To make a boxplot for each category of cyl (the formula syntax is a little bit like that for facet_wrap and facet_grid in ggplot2):
boxplot(mtcars$mpg ~ mtcars$cyl)
Notice how the numbers on the y-axis are rotated. To make them horizontal, like the numbers on the x-axis, use the las option:
boxplot(mtcars$mpg ~ mtcars$cyl, las = 1)
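The formula can also be written with bare column names plus a data argument, which avoids repeating mtcars$. The sketch below additionally uses plot = FALSE to retrieve the per-group summary instead of drawing it; the axis label text is an illustrative choice.

```r
data(mtcars)

# Same grouped boxplot, with the columns looked up in mtcars via data =
boxplot(mpg ~ cyl, data = mtcars, las = 1,
        xlab = "Number of cylinders", ylab = "Miles per gallon")

# plot = FALSE returns the summary statistics without drawing
b <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)
b$names   # group labels
b$n       # number of cars in each group
b$stats   # one column per group: whisker ends, hinges and median
```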
To make a bar plot showing how many rows there are for each value of cyl, we have to use the table function in conjunction with the barplot function. (What do you get if you use plot instead of barplot?)
table(mtcars$cyl)
## 
##  4  6  8 
## 11  7 14
barplot(table(mtcars$cyl))
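One handy detail is that barplot returns the x-coordinates of the bar midpoints, which can be reused to annotate the bars. A minimal sketch, with illustrative axis labels and the hypothetical names cyl_counts and mids:

```r
data(mtcars)
cyl_counts <- table(mtcars$cyl)

# Leave some headroom above the tallest bar for the count labels
mids <- barplot(cyl_counts, xlab = "Number of cylinders",
                ylab = "Number of cars", ylim = c(0, max(cyl_counts) + 2))

# Write each count just above its bar (pos = 3 means "above")
text(x = mids, y = as.vector(cyl_counts),
     labels = as.vector(cyl_counts), pos = 3)
```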
Plotting in base R can be very quick, even though the syntax may be harder to interpret and the outputs may look less professional.
Some other resources if you are interested in learning more about plotting in base R: